Introduction

Technology advances every year. This report analyzes the IT labor market based on surveys conducted in 2018 and 2023 and addressed to everyone connected with coding. The goal is to understand who most coders are, how much they earn, and what motivates them to change jobs.

Visualizations

Where do our answers come from?

As the plot shows, the majority of the answers we analyze come from the countries below. Most answers come from English-speaking countries and tech-oriented ones like Germany. Poland ranks 9th, which is a nice surprise: seeing our country so high in the ranking suggests that we have many people who are passionate about coding and a lot of potential.

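# Packages used throughout this report (harmless if already loaded in a setup chunk)
library(tidyr)
library(dplyr)
library(ggplot2)
library(plotly)

# Split multi-valued answers into one row per country, then count responses
df <- separate_rows(dwak18, Country, sep = ";")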
value_counts <- table(df$Country)
counts_df <- data.frame(Country = names(value_counts), count = as.numeric(value_counts))
top_20_countries <- counts_df %>%
  arrange(desc(count)) %>%
  head(20)

gg <- ggplot(top_20_countries, aes(x = reorder(Country, -count), y = count, fill = count)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 20 Countries by Survey Responses",
       x = NULL, y = "# of responses") +
  scale_fill_viridis_c() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plotly_plot <- ggplotly(gg)
plotly_plot

How are our respondents connected with coding?

The plot below presents the number of people who took part in the 2023 survey and how they are connected to coding. The respondents are mainly developers by profession. There are very few answers from people who are still learning or who code only occasionally.

df <- separate_rows(df23, MainBranch, sep = ";")
value_counts <- table(df$MainBranch)
counts_df <- data.frame(MainBranch = names(value_counts), count = as.numeric(value_counts))
counts_df <- counts_df %>%
  mutate(MainBranch = recode(MainBranch,
                             'I am a developer by profession' = 'I am a developer by profession',
                             'I am not primarily a developer, but I write code sometimes as part of my work/studies' = 'I am not a developer, but I write code as part of work/studies',
                             'I am learning to code' = 'I am learning to code',
                             'I code primarily as a hobby' = 'I code primarily as a hobby',
                             'I used to be a developer by profession, but no longer am' = 'I used to be a developer by profession, but no longer am',
                             'None of these' = 'None of these'
                             ))
counts_df <- counts_df %>%
  arrange(desc(count))

gg <- ggplot(counts_df, aes(x = reorder(MainBranch, -count), y = count, fill = MainBranch)) +
  geom_bar(stat = "identity") +
  labs(title = "Bar Plot of people taking part in survey",
       x = NULL, y = "# of responses") +
  scale_fill_viridis_d() +
  theme_minimal() +
  theme(axis.text.x = element_blank()) 
plotly_plot <- ggplotly(gg)
plotly_plot

What positions do our respondents hold?

The plot presents the number of developers of each type who took part in the 2018 survey. As expected, back-end, full-stack and front-end developers are the most common. About 14,000 respondents share our interest in data. That is not many compared with the 53,000 back-end developers alone.

df <- separate_rows(dwak18, DevType, sep = ";")
value_counts <- table(df$DevType)
counts_df <- data.frame(DevType = names(value_counts), count = as.numeric(value_counts))
counts_df$DevType <- factor(counts_df$DevType, levels = counts_df$DevType[order(-counts_df$count)])
gg <- ggplot(counts_df, aes(x = DevType, y = count, fill = DevType)) +
  geom_bar(stat = "identity") +
  labs(title = "Bar Plot of developer type popularity",
       x = NULL, y = "Count") +
  scale_fill_viridis_d() + 
  theme_minimal() +  
  theme(axis.text.x = element_blank()) 
plotly_plot <- ggplotly(gg)
plotly_plot
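
Because DevType is a multiple-choice field, a respondent who selects several types is counted once per type, so the bar heights add up to more than the number of respondents. A minimal sketch of that counting step on made-up data (the `toy` data frame and its values are hypothetical, not taken from the survey):

```r
library(tidyr)
library(dplyr)

# Hypothetical answers: three respondents, semicolon-separated developer types
toy <- data.frame(
  Respondent = 1:3,
  DevType = c("Back-end developer;Full-stack developer",
              "Back-end developer",
              "Data scientist;Back-end developer"),
  stringsAsFactors = FALSE
)

# One row per (respondent, developer type) pair
toy_long <- separate_rows(toy, DevType, sep = ";")

# Count how often each type was selected, most frequent first
toy_counts <- toy_long %>%
  count(DevType, name = "count") %>%
  arrange(desc(count))

toy_counts
```

Three respondents produce five rows here, which is why the per-type counts in the plot should be read as selections rather than as distinct people.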

Who earns the most money?

We checked the median salary of developers by coding experience. The results aren’t surprising: the best paid have 30+ years of experience. These are likely seniors, CEOs, and in general people who run projects or own companies, so it is expected that they earn the most. In general, the less experience a worker has, the less they earn. However, 20 to 30 years of experience gives roughly the same salary range; only below 18 years does the salary slowly drop.

dw18 <- subset(dwak18, select = c(YearsCoding, ConvertedSalary))

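# Calculate the mean and standard deviation of all salaries
# (median_of_all holds the mean, despite its name)
median_of_all <- mean(na.omit(dw18$ConvertedSalary))
std_of_all <- sd(na.omit(dw18$ConvertedSalary))

# Calculate the median salary for each experience group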
median_salary <- aggregate(ConvertedSalary ~ YearsCoding, data = dw18, FUN = median)
median_salary$avg <- round((median_salary$ConvertedSalary - median_of_all) / std_of_all, 2)
median_salary$decision <- ifelse(median_salary$avg < 0, "below", "above")  # Check against normalized average
median_salary <- median_salary[order(median_salary$avg), ]
median_salary$YearsCoding <- factor(median_salary$YearsCoding, levels=unique(median_salary$YearsCoding))
max_abs <- max(abs(c(median_salary$avg, 0)))

plot <- ggplot(median_salary, aes(x = reorder(YearsCoding, -ConvertedSalary), y = ConvertedSalary, fill = YearsCoding)) +
  geom_bar(stat = "identity") +
  labs(title = "Median Salary by Experience",
       x = "Experience",
       y = "Median Salary [monthly rate]") +
  scale_fill_viridis_d() + 
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))

pp <- ggplotly(plot)
pp

Average salary comparison

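Here we compare each experience group with the overall average. The plot shows, in standard deviations, how far each group’s median salary lies from the mean of all salaries, and therefore how much experience you need in order to earn the average. Most workers aren’t close to the average of 9,500.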

gg <- ggplot(median_salary, aes(x = avg, y = YearsCoding, color = decision)) +
  geom_segment(aes(xend = 0, yend = YearsCoding), size = 1.5) +
  geom_point(size = 3.5, color = "darkgrey") +
  geom_vline(xintercept = 0, color = "black", linetype = "dashed") +
  scale_color_manual(name = "Salary",
                     # Labels follow the alphabetical order of the levels: "above", then "below"
                     labels = c("Above Average", "Below Average"),
                     values = c("above" = "#00ba38", "below" = "#f8766d")) +
  labs(title = "Salary average based on coding experience",
       x = "Below/Above average salary", y = "Experience in coding") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  xlim(-max_abs, max_abs)

plotly_plot <- ggplotly(gg)
plotly_plot
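
The x-axis of this plot is a standardized score: a group's median salary minus the mean of all salaries, divided by the standard deviation of all salaries. A small sketch with invented numbers (the `salaries` data frame below is hypothetical) shows the calculation:

```r
# Hypothetical salaries for two experience groups
salaries <- data.frame(
  YearsCoding = c("0-2 years", "0-2 years", "0-2 years",
                  "30 or more years", "30 or more years", "30 or more years"),
  ConvertedSalary = c(2000, 3000, 4000, 8000, 9000, 10000)
)

# Mean and standard deviation over all respondents
overall_mean <- mean(salaries$ConvertedSalary)
overall_sd   <- sd(salaries$ConvertedSalary)

# Median salary per group, then its distance from the overall mean in SD units
group_median <- aggregate(ConvertedSalary ~ YearsCoding, data = salaries, FUN = median)
group_median$z <- round((group_median$ConvertedSalary - overall_mean) / overall_sd, 2)
group_median$decision <- ifelse(group_median$z < 0, "below", "above")

group_median
```

A negative score puts the group to the left of the dashed line, a positive score to the right. Since the reference point is the mean while each group is summarized by its median, a skewed salary distribution can push many groups below zero.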

What motivates developers most to change jobs?

The plot presents the most important attributes of a new job for dissatisfied and satisfied workers who are looking for one. Regardless of satisfaction with the current job, the most important things are professional development and the languages/frameworks the job offers to work in. What may surprise some is that compensation is only 3rd on the list. But we need to keep in mind that the respondents come from a wide range of ages and stages in life. As expected for this field of work, diversity is the least important feature.

#PLOT FOR THE MOST IMPORTANT FEATURES IN (DIS)SATISFIED WORKERS


#Get data, explore
dissatisfied_searching <- dwak18[(dwak18$JobSatisfaction %in% c("Extremely dissatisfied", "Slightly dissatisfied", "Moderately dissatisfied")) & 
                                   (dwak18$JobSearchStatus == "I am actively looking for a job") & 
                                   !is.na(dwak18$JobSatisfaction) & !is.na(dwak18$JobSearchStatus) & !is.na(dwak18$AssessJob1), ]
satisfied_searching <- dwak18[(dwak18$JobSatisfaction %in% c("Extremely satisfied", "Slightly satisfied", "Moderately satisfied")) & 
                                   (dwak18$JobSearchStatus == "I am actively looking for a job") & 
                                   !is.na(dwak18$JobSatisfaction) & !is.na(dwak18$JobSearchStatus) & !is.na(dwak18$AssessJob1), ]
assesjob <- dissatisfied_searching[, c('AssessJob1', 'AssessJob2', 'AssessJob3', 'AssessJob4', 'AssessJob5', 'AssessJob6', 'AssessJob7', 'AssessJob8', 'AssessJob9', 'AssessJob10')]
assesjob2 <- satisfied_searching[, c('AssessJob1', 'AssessJob2', 'AssessJob3', 'AssessJob4', 'AssessJob5', 'AssessJob6', 'AssessJob7', 'AssessJob8', 'AssessJob9', 'AssessJob10')]


col_names <- c("Industry", "Financial status of company", "Department/team", "Languages/frameworks", "Compensation and benefits", "Company culture", "Work from home/remotely", "Professional development", "The diversity", "Impact of the product/service")
counts <- numeric(length(assesjob))
counts2 <- numeric(length(assesjob2))

# Count how often each feature was ranked first (rank 1) by dissatisfied and satisfied workers
for (i in seq_along(assesjob)) {
  counts[i] <- sum(assesjob[[i]] == 1, na.rm = TRUE)
  counts2[i] <- sum(assesjob2[[i]] == 1, na.rm = TRUE)
} 


most_important <- data.frame(Name = col_names, Dissatisfied = counts, Satisfied = counts2)
most_important <- most_important[order(-most_important$Dissatisfied), ]
most_important$Name <- factor(most_important$Name, levels = most_important$Name)
data_long <- tidyr::pivot_longer(most_important, cols = c(Satisfied, Dissatisfied), names_to = "Satisfaction", values_to = "Value")

# Create ggplot
gg <- ggplot(data_long, aes(x = Value, y = Name, fill = Satisfaction)) +  # Counts on the x-axis, features on the y-axis
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Comparison of most important features in potential \nnew job according to satisfied vs dissatisfied developers", 
       x = "# of votes as most important", y = "Feature") +  # Adjust x and y axis titles
  theme_bw() +
  theme(axis.text.y = element_text(hjust = 1, size = 11),  # Adjust y-axis label font size and alignment
        axis.text.x = element_text(size = 13),  # Adjust x-axis label font size
        axis.title.y = element_text(size = 15),  # Adjust y-axis title font size
        axis.title.x = element_text(size = 15),  # Adjust x-axis title font size
        plot.title = element_text(size = 12, hjust = 0.5),
        legend.text = element_text(size = 12)) +
  coord_cartesian(xlim = c(0, 750)) +  # Adjust x-axis limit
  scale_fill_manual(values = c(viridisLite::viridis(4)[c(4, 2)]))  # Set custom fill colors
plotly_plot <- ggplotly(gg)
plotly_plot